Packet-switching computer communication networks are examples of distributed systems. With the large-scale emergence of mini and micro-computers, it is now possible to design special or general purpose distributed systems. However, as new problems have to be solved, new techniques and algorithms must be devised to operate such distributed systems in a satisfactory manner. In this paper, basic characteristics of distributed systems are analysed and fundamental principles and definitions are given. It is shown that distributed systems are not just simple extensions of monolithic systems. Distributed control techniques used in some planned or existing systems are presented. Finally, a formal approach to these problems is illustrated by the study of a mutual exclusion scheme intended for a distributed environment.


1. INTRODUCTION

Computer communication networks using packet-switching technology provide for the interconnection of data-processing equipment of any kind. Such systems, sometimes simply referred to as computer networks, may be viewed as multi-macroprocessors whenever the goals of resource-sharing are achieved. With the large-scale emergence of mini and microcomputers, it is now possible to envision building general or special purpose multimini and multimicrocomputers to be operated in a non-centralized manner. The need for automatic resource-sharing arises here in the same way as it does for multimacroprocessor systems.


Two kinds of resources must be considered : 

- system resources, multi-accessed by users and for 
which multiplexing is required (hidden sharing) 

- user resources, which users agree to share accord- 
ing to some protocol of their own (explicit sharing). 


This paper discusses the problems of system resource-sharing in a distributed environment. An example of a user resource-sharing problem is distributed data-base sharing.


2. DISTRIBUTED SYSTEMS: ELEMENTS FOR A FORMAL APPROACH

Experimental and public packet-switching computer communication networks have been built and operated since 1968; examples are Arpanet [7], Cyclades [13], EIN [1], Telenet [17] and Datapac [4]. The communication subnet of these networks is an example of a distributed system: all nodes have equal rights and responsibilities and no central "controller" is needed for the subnet to switch packets. Other examples are multicomputers like DCS [6] and Pluribus [8]. Finally, some multimicroprocessors currently planned by manufacturers will include "distributed features", in particular those designed to have high availability characteristics.


A definition of what is meant by a distributed system is needed. Then, analysis of the fundamental properties of computer systems makes it possible to tell whether or not a given system has distributed features.


2.1 Definitions


In a computer system, system resources are not accessed as such by users but through a set of services usually referred to as an operating system.

Examples of services, which we call operating functions, are: communication, user scheduling, resource allocation, hardware resource handling, system data management, etc. These functions are run through processes called logical entities.


Let $F = \{f_i,\ i \in I\}$ be the set of the operating functions and $E_i = \{e_j,\ j \in J(i)\}$ be the set of entities participating in function $f_i$. At any instant $t$, it is possible to define $s_t(e_j)$ as the instantaneous state of entity $e_j$. It is therefore theoretically possible to define the global state of $E_i$ at instant $t$ as the vector $S_t(E_i) = \{\ldots, s_t(e_j), \ldots\}$ for all $j \in J(i)$.


A system will be said to be $f_i$-centralized if there exists $k \in J(i)$ such that $S_t(E_i)$ is known to $e_k$.


An example is a system in which $\dim(E_i) = 1$; another example is a system in which entities run the operating function $f_i$ by using a common "system table".


A system will be said to be totally centralized if it is $f_i$-centralized for every $i \in I$.


In contrast, a system will be said to be $f_i$-distributed if there does not exist $k \in J(i)$ such that $S_t(E_i)$ is known to $e_k$.


A system will be said to be totally distributed if it is $f_i$-distributed for every $i \in I$.
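The definitions above can be made concrete with a small classifier. This is a minimal sketch under a loud assumption: "$S_t(E_i)$ is known to $e_k$" is modelled as a plain set of entities per function, which the paper does not do; the names `is_centralized`, `classify` and the sample functions are hypothetical, and "neither" is our label for a system that is centralized for some functions and distributed for others.

```python
from typing import Dict, Set

# Illustrative model (an assumption, not from the paper): for each operating
# function f_i we list its participating entities E_i and the entities that
# know the global state S_t(E_i).
Participants = Dict[str, Set[str]]
Knowers = Dict[str, Set[str]]


def is_centralized(func: str, participants: Participants, knowers: Knowers) -> bool:
    """f_i-centralized: some e_k in E_i knows S_t(E_i)."""
    return bool(participants[func] & knowers.get(func, set()))


def classify(participants: Participants, knowers: Knowers) -> str:
    """Apply the 'totally centralized' / 'totally distributed' definitions."""
    flags = [is_centralized(f, participants, knowers) for f in participants]
    if all(flags):
        return "totally centralized"
    if not any(flags):
        return "totally distributed"
    return "neither"  # centralized for some functions, distributed for others


# A single-entity scheduler (dim(E_i) = 1) knows its own state; no node of a
# packet-switching subnet knows the subnet's global state.
participants = {"scheduling": {"e1"}, "switching": {"n1", "n2", "n3"}}
knowers = {"scheduling": {"e1"}, "switching": set()}
```

With these sample sets, `classify(participants, knowers)` returns `"neither"`, since scheduling is centralized while switching is distributed.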


Typical cases of distributed systems are systems in which cooperating entities do not share the same physical space and/or do not have a common time reference. In such systems, an entity may get either a partial and coherent view of the system or a complete but incoherent view of the system, coherence meaning that observations are made at the same moment in the system (absolute time). This absence of uniqueness, both in time and in space, has very important consequences.


2.2 Time and space 


It is well known in physics that the sentence "an event occurred at time t" is meaningless if there is no indication of which time reference is used. Similarly, the statement "I am in location 1" has no meaning as long as a space reference and a time reference have not been defined. Therefore, for the purpose of describing the behaviour of an entity, we will use a precise time-space reference in which internal states, time lengths, names and instants may be defined.


We define the absolute Reference as the ideal reference such that the speed of communication between this reference and any other time-space reference is infinite, and in which every space location may be given a unique name.
(i) System properties


An entity may be viewed as a finite-state automaton: a decision to switch to a new state is possible through the observation of a specific sequence of events received during a time period $(t-a, t)$, measured in the local time reference.


property Q: $a$ is a finite value
property M: there are several time-space references for entities in the system


All systems studied here are assumed to have proper- 
ties Q and M. 


The propagation delay between two entities is the 
time needed, as measured in the absolute Reference, 
to transmit an elementary signal from one entity to 
the other. 


property P1: for any pair of entities, propagation delays are fixed, finite and known with absolute accuracy; they may be different for each pair
property P2: propagation delays are variable, finite and their values are not known with absolute accuracy
property P3: propagation delays are variable, finite and known a posteriori with absolute accuracy
property P4: propagation delays are variable but their values are upper bounded


We should mention that these properties are common to all systems that are spatially distributed with finite propagation delays, including, for example, conventional logic design. Formalization of these properties was felt necessary in order to infer from them basic principles which should be useful to distributed system designers.


(ii) Classification

- Systems with properties Q, M and P1 will be called Perfect Multireference Systems (PMS)

- Systems with properties Q, M and P3 will be called Quasi-Perfect Multireference Systems (QPMS)

- Systems with properties Q, M and P2 will be called Multireference Systems (MS).


Let V be a sequence of events occurring within a system; let E be the set of entities observing V. Events may be observed in two ways: referenced within a time-space reference $R_i$, $i \in I$, with $I$ being the set of the time-space references of E, or referenced within the absolute Reference. It may be interesting to consider the following problems:

(a) is it possible to build in $R_j$, for any $j \in I$, the absolute chronological ordering of events (as observed in the absolute Reference)?

(b) do all the entities of E observe the set of events V identically?


Answers to these questions are given in table 1.


Table 1 





* if the value a (property Q) is the same for all 
entities in the system. 


A graphical representation of properties P2, P3 and P4 may help to answer questions (a) and (b); an example is given in fig. 1.


Fig. 1 (time axes: absolute time in A, absolute time in B, absolute time in C)


In this example, three time-space references are represented. Three events X, Y, Z originating in B must be reported to A and C. It is easy to see that there may exist cases for which A, B and C will not be in agreement; for instance, C may miss the observation of event Z, whatever the finite value of u+v. Thus, if time is the only dimension used to achieve cooperation between entities, systems having property P4 are identical to Multireference Systems.
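The disagreement described above can be reproduced with a few lines of simulation. This is a minimal sketch with hypothetical emission times and delays (not the values of fig. 1): delays vary per event (property P2), and an optional `window_end` models an observer that stops waiting after a finite period (property Q). The function name `observed_order` is ours.

```python
from typing import Dict, List, Optional


def observed_order(emit_times: Dict[str, float], delays: Dict[str, float],
                   window_end: Optional[float] = None) -> List[str]:
    """Events in the order one observer receives them.

    emit_times: absolute emission time of each event at B (hypothetical values).
    delays: propagation delay of each event towards this observer (property P2:
            variable, not known to the observer).
    window_end: absolute time at which the observer stops waiting; later
            arrivals are missed, as with a finite observation period (property Q).
    """
    arrivals = sorted((emit_times[e] + delays[e], e) for e in emit_times)
    return [e for t, e in arrivals if window_end is None or t <= window_end]


emit = {"X": 0.0, "Y": 1.0, "Z": 2.0}
to_a = {"X": 1.0, "Y": 1.0, "Z": 1.0}   # constant delay towards A
to_c = {"X": 2.5, "Y": 0.5, "Z": 3.0}   # variable delays towards C

print(observed_order(emit, to_a))        # ['X', 'Y', 'Z']
print(observed_order(emit, to_c))        # ['Y', 'X', 'Z']
print(observed_order(emit, to_c, 4.0))   # ['Y', 'X']
```

A sees the absolute order, C sees Y before X, and with a finite window C misses Z altogether: neither question (a) nor question (b) can be answered positively from arrival times alone.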


(iii) Principles for non-perfect systems (QPMS and MS)


(a) Principle of time nondeterminacy


For any given sequence of events, it is impossible to prove that two different entities will observe this sequence of events identically. An example of an event is a change of internal state; as these changes are observable from the outside only through communication, one may state another principle:


(b) Principle of relativistic observation 


In a non-perfect system, it is impossible to prove that any two entities will have the same global view of a given subset of the system.


As a consequence, the environment of any entity cannot perceive the pair $(t, e(t))$ with absolute certainty, $t$ being the entity local time and $e(t)$ the current internal state of the entity. It will be possible either to know $e(t)$ and to state that the entity will reach that state within a predefined time interval $\Delta t$ with some probability, or to pick an instant $t$ without being able to associate a given state $e(t)$ with it for certain. Then we have the following principle:


(c) Principle of state nondeterminacy


The instantaneous state of an entity may only be expressed in terms of possible values associated with some probabilities.


This means that the global state of a non-perfect system does not exist. As a consequence, according to the definition, these systems are totally distributed systems.


It should be pointed out that if we agree to consi- 
der infinite values for propagation delays, then the 
above principles are valid a fortiori. 


In fact, the combination of properties P2 and Q implies that entities sometimes have to consider a propagation delay as being infinite: since delays are not known (P2) while decisions are taken over a finite period (Q), a signal that has not arrived within that period cannot be distinguished from one that will never arrive. This may or may not be in accordance with reality. A propagation delay is truly infinite when the signal never reaches the destination entity; this may happen for two reasons:

- entities in charge of communication handling failed

- the destination entity failed (was excluded from the system) before receiving the signal.


In some cases, failure-tolerant techniques are mandatory. There is no essential difference between these and techniques intended for distributed systems. An example of this approach is given in section 4.


As was pointed out before, system resources are accessed through operating functions. For every function, the decision about how and when to run the related entities is what is called control. Obviously, in a centralized system, control is performed by the "central" entity for that function ($e_k$ in the definition). According to the basic principles, the problems to be solved in distributed systems are:


- control must be achieved without knowledge of the global state. Therefore, what is needed is that each entity behaves according to some algorithm working on an approximation of this state. In spite of this uncertainty, these algorithms must be such that entities are kept in a legitimate state and that the global system behaviour is a convergent process.

- there is no entity which is, a priori, in charge of performing the control. One is then left with the problem of designing a common and secure protocol providing for the distribution of control among entities. It must also be proved that algorithms and protocols do not lead to inconsistencies and deadlocks.


According to the basic principles, time should not be 
used to achieve synchronization between entities. 


It is now possible to refine our terminology: what is called a distributed system in this paper is any non-perfect multireference system with distributed control for some of the operating functions.


Examples of solutions to the above problems will now 
be presented. 


3. EXAMPLES OF DISTRIBUTED SYSTEM TECHNIQUES 


3.1 Congestion control and end-to-end flow control in computer communication networks


End-to-end flow control allows for monitoring the transmission of data between a source entity and a destination entity, a specific sub-system being in charge of performing the actual transmission. The purpose of congestion control is to guarantee that the resources of the transmission sub-system are always used efficiently, i.e. that fair sharing of the sub-system between several source-to-destination flows is possible and that deadlocks may be either avoided or suppressed. Referring to the introduction and thinking in terms of hierarchical systems, end-to-end flow control is a problem of user resource-sharing at a given level, whereas congestion control is a problem of system resource-sharing at the next level, this being true for any level in the hierarchy. Similar views are discussed in detail in [15].


(i) Congestion control


Among the various existing schemes, isarithmic congestion control is one for which extensive analysis has been performed [16]. A constant number of permits is maintained over the network; the algorithm for the entities is the following: a new input packet is accepted only if the local permit value is greater than zero; when a packet reaches its destination, the entity may either store the corresponding permit or ship that credit to another entity, depending on whether or not a given threshold value has been reached for the number of local permits.
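The per-entity rule can be sketched as follows. This is a minimal sketch, not the scheme analysed in [16]: the class name, the choice of "next node" as the destination of a shipped credit and the instantaneous shipping are all assumptions made for illustration. The invariant it demonstrates is the one the text relies on, namely that permits held by nodes plus permits carried by packets in transit stay constant.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class IsarithmicNetwork:
    """Minimal sketch of isarithmic control: the total number of permits
    (held by nodes plus carried by packets in transit) never changes."""
    permits: List[int]      # permits currently held by each node
    threshold: int          # hypothetical limit on permits kept locally
    in_flight: int = 0      # packets in transit, each carrying one permit

    def total(self) -> int:
        return sum(self.permits) + self.in_flight

    def offer_packet(self, src: int) -> bool:
        """A new input packet is accepted only if src holds a permit."""
        if self.permits[src] > 0:
            self.permits[src] -= 1
            self.in_flight += 1
            return True
        return False

    def deliver(self, dst: int) -> None:
        """A packet reaches dst: keep its permit or ship it to another node."""
        if self.in_flight == 0:
            raise ValueError("no packet in transit")
        self.in_flight -= 1
        if self.permits[dst] < self.threshold:
            self.permits[dst] += 1
        else:
            # Assumed policy: ship the credit to the next node (instantaneous here).
            self.permits[(dst + 1) % len(self.permits)] += 1


net = IsarithmicNetwork(permits=[2, 0, 1], threshold=2)
accepted = [net.offer_packet(0), net.offer_packet(1), net.offer_packet(0), net.offer_packet(0)]
net.deliver(2)
net.deliver(2)
```

Node 1 rejects its packet because it holds no permit, node 0 rejects its third packet once its two permits are used, and the second delivery at node 2 finds the threshold reached, so the credit is shipped to node 0. Throughout, `net.total()` stays at 3; the sketch also shows why a lost permit (not modelled here) would silently shrink network capacity.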


Here, control is completely and permanently distri- 
buted over the entities. Isarithmic control appears 
as an efficient mechanism for avoiding global con- 
gestion. Nevertheless, this technique does not ful- 
fill the requirement of resistance to entity failu- 
res and no solution has been proposed as yet for 
controlling losses of permits. 


Most congestion control mechanisms are uniform, in the sense that they do not discriminate between traffic sources. Some mechanisms, for example CLL in Cigale [14], attempt to intervene only on those traffic sources which contribute to the load.


(ii) End-to-end flow control


The purpose of end-to-end flow control is to monitor data transmission between two entities such that the resources of these entities are properly utilized and the entities are kept synchronized. Because of different physical spaces and time references, and because of the existence of errors and duplication in transmission, traditional mechanisms like the producer-consumer scheme are not practicable; this view is consistent with the conclusions of a recent analysis described in [18].


A basic scheme which is now widely accepted and was first introduced in Cyclades [3] is the Window mechanism [2, 19]. Messages to be transmitted are sequentially numbered by the sender. This allows the receiver to detect loss and duplication of messages. The flow of data is acknowledged up to the first missing message. Together with the acknowledgment number (a), a credit value (c) is returned to the sender, meaning that transmission of messages is allowed up to number (a+c). It can easily be shown that the Window mechanism is resistant to erroneous, lost or duplicated messages.
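A receiver following the Window rules can be sketched in a few lines. This is a minimal sketch under stated assumptions: sequence numbers start at 0, (a) is taken as the number of the first missing message, "up to number (a+c)" is read as numbers strictly below a+c, and damaged messages are assumed to be discarded by a checksum before reaching the receiver (so they behave like lost ones). `WindowReceiver` and `sender_may_send` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class WindowReceiver:
    """Receiver side of a window scheme (assumptions in the lead-in)."""
    credit: int                                   # c
    next_expected: int = 0                        # a: first missing message
    buffer: Dict[int, str] = field(default_factory=dict)
    delivered: List[str] = field(default_factory=list)

    def receive(self, seq: int, data: str) -> Tuple[int, int]:
        # Keep messages inside the window; duplicates of delivered or already
        # buffered messages are ignored.
        if self.next_expected <= seq < self.next_expected + self.credit:
            self.buffer.setdefault(seq, data)
        # Deliver in order up to the first missing message.
        while self.next_expected in self.buffer:
            self.delivered.append(self.buffer.pop(self.next_expected))
            self.next_expected += 1
        return self.next_expected, self.credit    # acknowledgment (a, c)


def sender_may_send(seq: int, ack: int, credit: int) -> bool:
    """The sender may transmit message seq only while seq < a + c."""
    return seq < ack + credit


rx = WindowReceiver(credit=3)
acks = [rx.receive(0, "m0"),   # in order
        rx.receive(2, "m2"),   # message 1 was lost: m2 is buffered
        rx.receive(0, "m0"),   # duplicate, ignored
        rx.receive(1, "m1")]   # retransmission fills the gap
```

The acknowledgments are `(1, 3)` three times and then `(3, 3)`, and `rx.delivered` ends as `["m0", "m1", "m2"]`: loss is repaired by retransmission, and the duplicate has no effect because its number lies below (a).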


3.2 Load-sharing in multicomputer systems


The operating function considered here is processor assignment in a system where processors are logically equivalent, i.e. they all have the same capabilities. Entities are processes responsible for computing processor load and running the load-sharing algorithm. System-wide throughput and response time are optimal when the load is equally distributed over the processors.


Automatic load-sharing should only be performed by the entities themselves and not by the external requestors. This provides for the necessary independence between different logical levels in the system. As a consequence, processor failures, extension or reduction of the configuration and changes in the hardware topology are totally irrelevant to the other system levels.


Two methods for achieving automatic load-sharing are now presented.


(i) Diffusion technique


This technique is similar to adaptive routing mechanisms. For every entity, the notion of a neighbour is defined; regularly, entities exchange a load vector $V = \{l, i\}$ with their neighbours, $l$ being the minimum value of the loads most recently received by the entity and $i$ being the identity of the corresponding processor. Then, at any moment, upon receiving an external request, an entity is able to route it immediately to the least loaded processor. Stability conditions must be computed according to hardware performance and processing requirements.
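The diffusion rule can be illustrated with synchronous exchange rounds. This is a simplified sketch, not the simulated mechanism of [10]: real exchanges are asynchronous, loads are kept static here, and the topology, loads and function names (`diffusion_round`, `route`) are hypothetical. Each entity keeps the smallest (load, identity) pair among its own load and the vectors its neighbours last sent, exactly the $\{l, i\}$ pair described above.

```python
from typing import Dict, List, Tuple

LoadVector = Tuple[int, int]  # (l, i): smallest load known, processor identity


def diffusion_round(loads: Dict[int, int], neighbours: Dict[int, List[int]],
                    vectors: Dict[int, LoadVector]) -> Dict[int, LoadVector]:
    """One exchange round (simplification: all entities exchange at once).
    Each entity keeps the smallest (load, identity) pair among its own load
    and the vectors most recently received from its neighbours."""
    return {p: min([(loads[p], p)] + [vectors[q] for q in neighbours[p]])
            for p in loads}


def route(entity: int, vectors: Dict[int, LoadVector]) -> int:
    """Processor to which this entity sends an external request."""
    return vectors[entity][1]


# Hypothetical line topology 0 - 1 - 2 - 3 with static loads.
loads = {0: 5, 1: 7, 2: 3, 3: 9}
neighbours = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
vectors = {p: (loads[p], p) for p in loads}
for _ in range(2):
    vectors = diffusion_round(loads, neighbours, vectors)
```

After one round, entity 0 still believes it is the least loaded; after two rounds every entity holds `(3, 2)` and `route(0, vectors)` returns processor 2. The number of rounds needed grows with the diameter of the neighbour graph, which is one reason stability conditions matter when loads change between rounds.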


(ii) Circulating vector technique


A successor is defined for each entity, such that all entities are located on a virtual ring. On this ring, one or several load vectors circulate, the dimension of the vectors being equal to the number of entities. The individual algorithm simply consists of each entity, upon receiving a vector, updating its own component with the current load value, recording a copy of the vector and transmitting it onto the ring. Notice that loss of a vector or creation of several vectors is not catastrophic to the system.
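A single circulating vector can be traced with the sketch below. It is illustrative only: loads are static during the trace, `math.inf` marks components not yet filled in (our convention), and `circulate` and `least_loaded` are hypothetical names. Each hop performs the three steps of the individual algorithm: update own component, record a copy, forward to the successor.

```python
import math
from typing import List, Tuple


def circulate(loads: List[int], vector: List[float], start: int,
              hops: int) -> Tuple[List[float], List[List[float]]]:
    """Pass one load vector around the virtual ring i -> (i + 1) mod n.
    On receipt, each entity writes its current load into its own component,
    records a copy and forwards the vector to its successor. Returns the
    vector and the last copy recorded by each entity (empty if none)."""
    n = len(loads)
    vector = list(vector)
    copies: List[List[float]] = [[] for _ in range(n)]
    p = start
    for _ in range(hops):
        vector[p] = loads[p]        # update own component
        copies[p] = list(vector)    # record a copy
        p = (p + 1) % n             # transmit to the successor
    return vector, copies


def least_loaded(copy: List[float]) -> int:
    """Index of the smallest load in an entity's recorded copy."""
    return min(range(len(copy)), key=lambda i: copy[i])


unknown = [math.inf] * 4            # components not yet filled in
vec, copies = circulate([4, 2, 6, 1], unknown, start=0, hops=8)
```

After one revolution the vector is complete, but entity 0's own copy only becomes complete during the second revolution; after eight hops `least_loaded(copies[0])` returns 3. Because each entity simply overwrites its own component, a lost vector can be replaced by a fresh one and duplicate vectors only refresh the copies more often, which is consistent with the remark that neither case is catastrophic.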


The efficiency of these techniques has been evaluated by means of a simulation model. Some results
may be found in [10]. These mechanisms may be used 
as they stand to distribute the load evenly in a 
system or they may be used in connection with some 
other mechanisms when it is necessary to take loca- 
lity constraints into account. 


3.3 Distributed allocation of resources 


The problem is the following: U-entities (users) must be allocated R-entities (resources); specific entities are in charge of multiplexing several U-entities and performing the resource allocation (in a system described elsewhere [9], these entities are called controllers). A communication sub-system is used by the controllers to send their requests directly to R-entities; how should deadlocks be avoided? Among several techniques, the circulating control token scheme is now presented.


For every controller, a successor is defined such that controllers are located on a virtual ring. One representation of an n-controller ring may be $\{i \to (i+1) \bmod n,\ i \in [0, n-1]\}$, each integer $i$ being the identity of a controller. Asynchronous and natural time division is achieved by means of a unique control token circulating on this virtual ring; a controller is allowed to send allocation requests only when it owns the control token; R-entities are provided with waiting queues in which requests are recorded, up to a predefined limit (congestion control). When all requests have been answered, the control token is transmitted to the successor on the ring; later, the U-entity will receive a message from each requested R-entity indicating that it has now moved to the first position in the queue and that utilization of the resource is allowed. The average number of R-entities to be requested at a time is not independent of the circulating speed of the control token, and it directly influences the system-wide job throughput. Extensive simulation, described in [12], has been performed to evaluate the performance of this technique.
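Why the token avoids deadlock can be checked on a small example. This is a minimal sketch under assumptions: one U-entity request set per controller per revolution (queue entries are labelled by controller), the queue-length limit is omitted, and `token_revolution` and `waits_are_acyclic` are hypothetical names. Because only the token holder may enqueue requests, every waiting queue lists controllers in token-visit order, so the "queued before" relation has no cycle and no circular wait can form.

```python
from collections import deque
from typing import Deque, Dict, Set, Tuple


def token_revolution(requests: Dict[int, Set[str]], n: int,
                     start: int = 0) -> Dict[str, Deque[int]]:
    """One revolution of the control token on the ring i -> (i + 1) mod n.
    Only the token holder sends allocation requests; each R-entity appends
    them to its FIFO waiting queue (the queue-length limit is omitted)."""
    queues: Dict[str, Deque[int]] = {}
    holder = start
    for _ in range(n):
        for resource in sorted(requests.get(holder, set())):
            queues.setdefault(resource, deque()).append(holder)
        holder = (holder + 1) % n   # pass the CT to the successor
    return queues


def waits_are_acyclic(queues: Dict[str, Deque[int]]) -> bool:
    """Build the 'queued before' graph over controllers and check that it has
    no cycle; a cycle is what allows a circular wait (deadlock)."""
    edges: Dict[int, Set[int]] = {}
    for q in queues.values():
        items = list(q)
        for pos, a in enumerate(items):
            edges.setdefault(a, set()).update(items[pos + 1:])
    nodes = set(edges) | {b for bs in edges.values() for b in bs}
    indeg: Dict[int, int] = {v: 0 for v in nodes}
    for bs in edges.values():
        for b in bs:
            indeg[b] += 1
    # Kahn's algorithm: the graph is acyclic iff every node can be removed.
    ready = [v for v in nodes if indeg[v] == 0]
    removed = 0
    while ready:
        v = ready.pop()
        removed += 1
        for b in edges.get(v, set()):
            indeg[b] -= 1
            if indeg[b] == 0:
                ready.append(b)
    return removed == len(nodes)


requests = {0: {"R1", "R2"}, 1: {"R2", "R1"}, 2: {"R1"}}
queues = token_revolution(requests, n=3)
```

Here R1's queue is `[0, 1, 2]` and R2's is `[0, 1]`, and the check succeeds. Without the token, requests could be recorded in opposite orders (for example R1: `[0, 1]`, R2: `[1, 0]`), which the same check reports as a potential circular wait.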


This is an example of a technique which must be shown to survive failures; of major concern is the guarantee that the control token (CT) is never lost and that there is only one token on the ring. A protocol fulfilling this requirement exists and is now presented.


4. A SECURE PROTOCOL TO ACHIEVE MUTUAL EXCLUSION IN A DISTRIBUTED SYSTEM


We will discuss problems related mainly to controller failures, such as: what should we do when the controller which owns the CT goes down and thus removes the CT from the ring?


4.1 Ring failures 


First, assume that an error control mechanism based on "life messages" [7, 9] is provided at the hardware level, which allows for virtual ring reconfiguration. Then, controller i may be temporarily excluded from the ring, the successor of controller i-1 being controller i+1.
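The reconfigured ring can be described by a successor function that skips excluded controllers. This is a minimal sketch: how exclusions are detected (the life-message mechanism) is not modelled, and `successor` is a hypothetical helper rather than part of the protocol in [9].

```python
from typing import Set


def successor(i: int, n: int, excluded: Set[int]) -> int:
    """Next live controller after i on the ring i -> (i + 1) mod n, skipping
    controllers that have been temporarily excluded."""
    for step in range(1, n):
        j = (i + step) % n
        if j not in excluded:
            return j
    return i  # i is the only controller left on the ring
```

With five controllers and controller 2 excluded, `successor(1, 5, {2})` returns 3, i.e. the successor of controller i-1 becomes controller i+1; the modulo also handles wrap-around, so `successor(4, 5, {0})` returns 1.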


Second, let us briefly discuss problems related to failures of the interconnection structure (unibus, multibus, digital loop, multidrop telephone line, radio/satellite communication channel, matrix switch, store-and-forward networks, ...). Transmission errors are easily recovered by using a simple mechanism like the Window technique. If the structure does not provide more than one physical path between any pair of controllers, then it is of no help to design a failure-tolerant distributed protocol; if the structure does so, then failure of a subset of the structure can be controlled and recovered by using well-known techniques like adaptive routing or alternate fixed routing.


4.2 The protocol 


The protocol consists of a precedence rule and an 
election phase algorithm. 


(i) Hypotheses


- controller identities are integer values in [0, n-1] for n controllers (H1)

- controllers may access the header of messages circulating on the ring

- the CT and other tokens are empty messages (header only)

- each controller owns a timer; this timer is reset at each CT occurrence

- the system is asynchronous; timer values are not necessarily identical; if they are, no consequence can be drawn from this (principle of time nondeterminacy); each controller is provided with its own time clock (H2)

- the sequence of messages on the ring is unchanged, i.e. messages received by a controller are retransmitted FIFO

- creation of a CT is an instantaneous action

- when its timer awakes, the controller instantaneously generates a token carrying its own identity; this token is a candidate for being the new CT


(ii) Precedence rule 


A controller which has generated a token and which receives the CT before its own token has completed a period must remove this token from the ring.


Indeed, "early" generation of a token may be due to large round-trip times for the CT, short timer values and so forth. In any case, as a CT is circulating on the ring, there is no need for local action.


(iii) Election phase algorithm


The problem is to design an algorithm such that one can prove that, when the control token (CT) is lost, there is a unique controller to be elected as responsible for regenerating a new CT (constraint A), within a finite time delay (constraint B).


Let $I$ be the set of controllers participating in the election, i.e. controllers whose timers awake between the loss of the CT and the regeneration of a new CT; let $S(i)$, $i \in I$, be the set of token identities recorded by controller $i$ after a complete rotation of token $i$; obviously, one of these identities will be $i$ itself.


Uniqueness of the choice made by the participating controllers is guaranteed if:

condition (a): the algorithm is the same for all controllers

condition (b): the value of $S(i)$, $i \in I$, is the same for all controllers.


Unfortunately, there are cases for which condition (b) is not true (see fig. 2): $T_j$ being generated after $T_i$ crossed controller $j$ and before $T_k$ crossed the same controller, one is left with a situation where $S(i) = \{i, k\}$ and $S(k) = \{i, j, k\}$.


One may be tempted to solve the problem by using one of these solutions:

Solution 1 : for each token crossing a controller, 
the local timer is reset to its initial value; a 
new token is generated locally only when the timer 
awakes. 

Solution 2 : solution 1 plus : each controller must 
remove from the ring any token circulating after the 
first token so that only that one will be left on 
the ring. 


It should be noted that in this case all controllers, even those not participating in the election (which means permanently), are required to record the crossing of any token. It is not difficult to conclude that these solutions are not acceptable, because of H2 and condition (b).







Fig. 2 (controllers i, j, k on the ring; $T_x$: token of controller x)


As the set of controllers is strictly ordered (H1), a simple algorithm may be proposed:

Ω: if $i = \min S(i)$, then entity $i$ immediately generates the new CT

In order to demonstrate that Ω satisfies constraint A, let us first describe the state-transition table of entity $i$ on the ring (table 2).
Comments:

0: awakening of the timer

1: reception of the control token

2: reception of a candidate token, the identity of which is smaller than i

3: reception of a candidate token, the identity of which is larger than i

4: reception of the candidate token i (after one complete revolution)

α: idle, control token timer is set

β: candidate token timer is set and i is prepared to regenerate a new control token

γ: candidate token timer is set and i is not responsible for the control token regeneration

α*: generation of the new control token and immediate switching to state α


We will use the following notation:

$I(CT, x)$ = instant of control token reception by entity x

$I(t(x), y)$ = instant of reception of the candidate token x by entity y

$I(x, 0)$ = instant of generation of a candidate token by entity x

$I(x, x)$ = occurrence of event 4


Let us imagine that two entities x and y generate a control token "simultaneously" (during the same revolution on the ring), thus violating constraint A; we will show that this situation is impossible. Let us assume, for example, that identity(x) < identity(y).


Entity y will generate a new token if and only if state(y) at time $I(y,y)$ is β; this implies $\bar{1}$ and $\bar{2}$ between $I(y,0)$ and $I(y,y)$, where $\bar{n}$ means non-occurrence of event n.

Similarly, assuming that x will generate a new token implies ($x < y$): $\bar{1}$ between $I(x,0)$ and $I(x,x)$.

It is easy to show that a subset of these conditions leads to a contradiction:

$$\bar{2} \text{ between } I(y,0) \text{ and } I(y,y) \text{ for entity } y \;\Rightarrow\; I(t(x),y) > I(y,y)$$

$$\bar{1} \text{ between } I(x,0) \text{ and } I(x,x) \text{ for entity } x \;\Rightarrow\; I(CT,x) > I(x,x)$$

with the CT received by x being the token generated by y. This constraint and the FIFO hypothesis imply that, for entity y, $I(t(x),y) < I(y,y)$, which contradicts the first implication.

It has thus been demonstrated that the CT cannot be generated by two different entities during one revolution on the ring.
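The election can also be exercised by simulation. This is a sketch under explicit assumptions, not the paper's protocol specification: table 2 itself does not survive in this copy, so the transitions below are inferred from the comments on events 0 to 4 and states α, β, γ (in particular, a controller in α simply forwards candidate tokens); hop delay is constant, processing time is zero, and `elect` is a hypothetical name. It checks constraint A on sample scenarios, including the fig. 2 situation where condition (b) fails; it is not a proof.

```python
import heapq
from itertools import count
from typing import Dict, List, Set, Tuple

WAKE, CT = -2, -1   # markers; controller identities are 0..n-1 (H1)


def elect(n: int, wake_times: Dict[int, float], hop_delay: float = 1.0,
          horizon: float = 100.0) -> Tuple[List[int], Dict[int, Set[int]]]:
    """Simulate the election phase after loss of the CT on a FIFO ring
    i -> (i + 1) mod n. Returns the controllers that generated a CT and the
    set S(i) recorded by each participating controller."""
    state = {i: "alpha" for i in range(n)}
    ct_seen = {i: False for i in range(n)}
    recorded: Dict[int, Set[int]] = {}
    generators: List[int] = []
    tick = count()                      # tie-breaker preserving FIFO order
    events: List[Tuple[float, int, int, int]] = []

    def send(t: float, frm: int, token: int) -> None:
        heapq.heappush(events, (t + hop_delay, next(tick), (frm + 1) % n, token))

    for i, t in wake_times.items():
        heapq.heappush(events, (t, next(tick), i, WAKE))

    while events:
        t, _, i, token = heapq.heappop(events)
        if t > horizon:
            break
        if token == WAKE:                        # event 0
            if state[i] == "alpha" and not ct_seen[i]:
                state[i], recorded[i] = "beta", {i}
                send(t, i, i)                    # candidate token i
        elif token == CT:                        # event 1
            ct_seen[i] = True                    # timer reset
            state[i] = "alpha"                   # precedence rule: own token
            send(t, i, CT)                       # is removed when it returns
        elif token == i:                         # event 4: own token is back
            if state[i] == "beta":
                generators.append(i)             # alpha*: generate the new CT
                ct_seen[i] = True
                send(t, i, CT)
            state[i] = "alpha"                   # own token leaves the ring
        else:                                    # events 2 and 3
            if state[i] != "alpha":
                recorded[i].add(token)
                if token < i:
                    state[i] = "gamma"
            send(t, i, token)
    return generators, recorded


# Fig. 2 situation with i = 0, j = 1, k = 2 (hypothetical timings).
gens, S = elect(3, {0: 0, 2: 2, 1: 15}, hop_delay=10, horizon=200)
```

In this run `S[0] == {0, 2}` while `S[2] == {0, 1, 2}`, so condition (b) does not hold, yet `gens == [0]`: only controller 0 regenerates the CT, and controller 1 drops its own token because the new CT reaches it first (precedence rule).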


Constraint B is obviously fulfilled by Ω, and it is possible to compute bounds for the time T needed to regenerate a new CT. If x is the identity of the first controller to initiate the election phase after loss of the CT and if δ is the maximum value of the time required for a controller to process a token and to hand it down to its neighbour on the ring, then we have:

$$R \le T \le x(R - \delta) + \delta$$

T being counted from the instant of timeout for controller x.


(iv) Failures during the election phase


When considering the failure of a controller participating in the election phase, two problems must be tackled. Failure of the controller which is precisely the one being elected by the other controllers as responsible for generating the new CT is not catastrophic: protection against infinite waiting is provided by timers, and the election phase will only be longer than in a failure-free situation.


The other problem is what to do with tokens generated by controllers which have failed before their tokens have completed a period. One protocol may be that only controller $i$ is allowed to remove $T_i$ from the ring. This is acceptable provided that failures are either exceptional or of short duration. Otherwise, other controllers must be allowed to destroy tokens pertaining to dead controllers; for instance, the last elected controller may be responsible for this.


Another solution is based on mutual help and mutual suspicion principles being applied in the whole system.


When a controller failure is detected, its neighbours help to "clean" the situation. One of the actions to be taken could be precisely to withdraw from the ring all tokens and messages generated by this failing controller. These actions are then dependent upon an error control mechanism situated at another logical level in the system.


Problems related to ring reconfiguration after a failure and to reintegration of a controller on the ring, as well as a second protocol achieving mutual exclusion, are analysed in detail in [11].


We should mention the practical utility of the election protocol. Controllers are autonomous not only while the system is running but also during the initialization phase. No external action is needed, as controllers will spontaneously undertake an election when the system is turned on. In this sense, this approach is different from the one described in [5], as here there is no need for a stabilization to be achieved after initialization.


5. CONCLUSION 


Because of hardware technology trends, distributed systems are receiving more and more attention. Importantly, this kind of system seems to fulfil user needs more satisfactorily and more easily than conventional, centralized systems: processors may migrate so that people do not have to, and fully modular systems are easier to maintain, to expand and so forth. In this paper, an attempt has been made to clarify the concept of a distributed system; the nature of such systems has been analysed, definitions and design principles have been given, and specific techniques have been presented and discussed.